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Abstract 

Studies of reproductive isolation between homoploid hybrid species and their parent species have rarely been carried out. 
Here we investigate reproductive barriers between a recently recognized hybrid bird species, the Italian sparrow Passer 
italiae and its parent species, the house sparrow P. domesticus and Spanish sparrow P. hispaniolensis. Reproductive barriers 
can be difficult to study in hybrid species due to lack of geographical contact between taxa. However, the Italian sparrow 
lives parapatrically with the house sparrow and both sympatrically and parapatrically with the Spanish sparrow. Through 
whole-transcriptome sequencing of six individuals of each of the two parent species we identified a set of putatively parent 
species-diagnostic single nucleotide polymorphism (SNP) markers. After filtering for coverage, genotyping success (>97%) 
and multiple SNPs per gene, we retained 86 species-informative, genie, nuclear and mitochondrial SNP markers from 84 
genes for analysis of 612 male individuals. We show that a disproportionately large number of sex-linked genes, as well as 
the mitochondria and nuclear genes with mitochondrial function, exhibit sharp dines at the boundaries between the hybrid 
and the parent species, suggesting a role for mito-nuclear and sex-linked incompatibilities in forming reproductive barriers. 
We suggest that genomic conflict via interactions between mitochondria and sex-linked genes with mitochondrial function 
("mother's curse") at one boundary and centromeric drive at the other may best explain our findings. Hybrid speciation in 
the Italian sparrow may therefore be influenced by mechanisms similar to those involved in non-hybrid speciation, but with 
the formation of two geographically separated species boundaries instead of one. Spanish sparrow alleles at some loci have 
spread north to form reproductive barriers with house sparrows, while house sparrow alleles at different loci, including 
some on the same chromosome, have spread in the opposite direction to form barriers against Spanish sparrows. 
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Introduction 

Hybridization between divergent populations has diverse 
impacts on evolution [1-3], including the rapid formation of 
hybrid species [1-6]. Homoploid hybrid speciation (HHS) is the 
process through which hybridization between two taxa results in a 
third, novel taxon that remains distinct by means of reproductive 
barriers against both parent taxa, without a change in number of 
chromosome sets. This mode of speciation is thought to be rare in 
nature as hybridization must be initiated by gene exchange 
between two taxa, but this gene exchange also subsequendy 
reduces the likelihood of hybrid speciation occurring. Gene flow 
from the parents must be countered or reduced after initial contact 
despite complementary ploidy levels and weak initial isolation 
[2,5]. 

In non-hybrid speciation involving species with chromosomal 
sex-determination, sex-linked genes have repeatedly been found to 
strongly influence reproductive isolation (RI) [7-10]. The prom- 
inent role of sex chromosomes as reproductive barriers between 
non-hybrid species is attributed to a fast rate of genetic divergence, 



exposure of recessive alleles to selection in the heterogametic sex, a 
predominance of genes with sexual functions, and high average 
linkage between the disproportionately many sex-linked genes 
involved in RI [7,8]. However, an equally important role for sex 
chromosomes in HHS is far from setded. One of the most likely 
mechanisms for HHS is thought to be through transgressive 
segregation: the production of trait values outside the range of 
both parent taxa in the hybrids, allowing adaptation to ecological 
niches unavailable to those parent taxa [5,1 1]. Whereas divergent 
selection on reproductive traits is expected to be heavily involved 
in non-hybrid speciation, increasing the influence of sex chromo- 
somes, transgressive ecological adaptations are more likely to be 
autosomal and evolve under stabilizing selection in parents [8, 1 1— 
13]. Divergent selection produces extreme phenotypes when it 
leads to parent taxa being fixed for alleles with opposite effects at 
each locus. Hence in this situation, additive genetic variation leads 
to intermediate hybrid trait values [11,13]. In contrast, divergence 
under stabilizing selection occurs through weakly selected turnover 
of alleles contributing to a trait with an intermediate optimum. 
This promotes divergence on autosomes [12], and leads to a 
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Author Summary 

Hybridization between two species has the potential to 
create a third, hybrid species. However this process, known 
as hybrid speciation, is thought to be unlikely because it 
requires reproductive barriers against both parent species 
to develop despite the barriers between parents being 
weak enough to allow for the formation of viable, fertile 
hybrids. The Italian sparrow, which occupies the entire 
Italian peninsula and some Mediterranean islands, is the 
product of past hybridization between house and Spanish 
sparrows and therefore represents one of the few 
documented cases of vertebrate hybrid speciation in 
nature. We show that reproductive barriers between 
Italian sparrows and their parent species exist and that 
genes on the sex (Z) chromosome and mitochondria are 
heavily involved. We suggest that speciation in this system 
may have been driven by dissociation of the sex (Z) 
chromosome into blocks of different parent species- 
specific genes, which have shifted alongside mitochondrial 
genes to form reproductive barriers where the hybrid now 
meets each of its parent species. 

mixture of loci fixed for alleles with both positive and negative 
effects on trait values in each taxon. Thus, through additive effects 
alone, hybrids with more positive- or negative-effect alleles than in 
either parent taxon will often be produced, leading to transgressive 
phenotypes [1 1]. 

The study of RI in hybrid species systems can be complicated by 
a lack of geographical overlap between the hybrid and one or both 
of its parent species [5]. In Passer sparrows, however, the 
distribution of the hybrid Italian sparrow Passer italiae [14,15] 
overlaps with those of both its parent species, the Spanish sparrow 
P. hispaniolensis and the house sparrow P. domesticus (Figure 1) [14] 
allowing for the study of reproductive barriers. The Italian 
sparrow is in contact with the house sparrow in a stable, narrow 
hybrid zone in the Alps [14,16] and with the Spanish sparrow in a 
recently established sympatric zone in southeast Italy. In Sardinia, 
off the west coast of Italy, Spanish sparrows occur allopatrically 
(Figure 1). House and Spanish sparrows are themselves broadly, 
and often locally, sympatric across the entire Spanish sparrow 
range, remaining phenotypically distinct in all but a few locations 
[16]. Hence reproductive barriers exist and are typically effective 
in maintaining isolation between the parent species, but can be 
broken down to form viable hybrid populations and species. 

Previous studies have indicated that Italian sparrows are almost 
fixed for house sparrow mitochondrial DNA [14,15]. Moreover, 
two markers on the Z chromosome (birds are female-heteroga- 
metic with a ZZ/ZW sex chromosome system) were found to be 
fixed for the Spanish sparrow allele in one case (CHD1J?) and 
nearly fixed for the house sparrow allele in the other (PLAA), 
indicating strong mosaicism on the Z chromosome not paralleled 
by autosomal markers [15]. 

The evidence of Z chromosome mosaicism from existing studies 
[15] may indicate that Italian sparrow isolating mechanisms are 
more similar to those involved in non-hybrid speciation than 
would be expected given a strong influence of transgressive 
segregation. Furthermore, CHDl^ has been shown to be under 
divergent selection and associated with RI between several non- 
hybrid bird species pairs [15 and references therein]. For example, 
CHDl^ shows evidence of divergent selection in sympatric but not 
allopatric population comparisons of pied and collared flycatchers 
(Ficedula hypokuca and F. albicolMs) [17]. Should CHDl^prove to be 
an informative marker in Italian sparrows, it may be particularly 



likely to represent a functional variant directiy involved in RI (or at 
least a linked marker within the same gene) rather than a linked 
marker within a neutral gene. 

Evidence is accumulating that mito-nuclear interactions cause 
postzygotic isolation and are influential in speciation [18,19]. 
Mitochondrial DNA also commonly shows strongly shifted clines 
relative to nuclear markers, and this is often attributed to adaptive 
introgression [20]. Such introgression can occur in the face of 
detrimental effects on males, a phenomenon known as "mother's 
curse" [21,22]: selection in males has no direct effect on 
mitochondrial fitness due to maternal inheritance, causing a 
selective sieve allowing the excessive build-up of male-detrimental 
mutations. Hence, if "mother's curse" is acting in the Italian 
sparrow we would expect male-compensatory alleles to track the 
spread of house sparrow mitochondria and show concordant 
clines. 

Here, we analyze species-informative single nucleotide poly- 
morphism (SNP) markers, located within functional transcribed 
genes, across the breeding range of the Italian sparrow, as well as 
the contact zones with its parent species. We use a cline analysis 
approach [23] to identify candidate hybrid-parent RI genes, and 
hence to elucidate the mechanisms involved in HHS. In particular, 
we look for evidence of coincidence between nuclear and 
mitochondrial clines and discuss the possibility that they represent 
the outcome of mother's curse in a hybrid species, and whether 
this mechanism may be influential in hybrid speciation. We test 
whether markers with clines falling on current hybrid-parent 
species boundaries are disproportionately (i) Z-linked, thus 
showing similarities with non-hybrid speciation, or (ii) autosomal, 
suggesting differences in mechanisms of hybrid speciation relative 
to non-hybrid speciation and a greater influence of stabilizing 
selection and transgressive segregation. 

Results 

We identified putatively parent species-diagnostic SNP markers 
through transcriptome sequencing of six individuals of each of the 
two parent species. After filtering for coverage, sufficient flanking 
sequence, genotyping success (>97%) and multiple SNPs per 
gene, we retained 86 species-informative, genie, nuclear and 
mitochondrial SNP markers from 84 genes for analysis. Using this 
marker set, we found the Italian sparrow to exhibit high levels of 
genomic admixture over the entire study area (Figure 1 and Table 
SI). We also found evidence for on-going but restricted gene 
exchange between Italian sparrows and house sparrows in the 
contact zone in the Alps (Figure SI), though no evidence for gene 
exchange between Italian and Spanish sparrows in the sympatric 
zone in southeast Italy (see below). However, STRUCTURE [24] 
analysis revealed evidence of migration between Italian sparrow 
populations on mainland Italy and Spanish sparrow populations 
on Sardinia. Early generation migrants were present in both 
locations indicating ongoing gene flow through dispersal events 
(Figure 1). 

As gene flow was observed between the Italian sparrow and 
both parent species, we implemented a cline analysis framework to 
look for genes exhibiting steep clines, and therefore decreased gene 
flow at the species boundaries. The SNPs are within functional 
coding genes and hence any such clines may indicate a direct 
influence of the gene on RI. They may also, however, be neutral 
but closely linked to loci under selection. These genes nevertheless 
represent the most likely candidates to be involved in RI at this 
inferential stage of analysis. 

Cline analysis is a method used to measure the steepness, shape 
and location of changes in allele frequency or locus-specific 
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Figure 1 . Phenotypic and genetic makeup of the hybrid Italian sparrow. Coloration of the map denotes phenotypic distribution as indicated 
by the bird drawings to the right of the map (blue: house sparrow, turquoise: Italian-house hybrids, yellow: typical Italian sparrow, orange: Italian 
sparrows with plumage intermediate between typical Italian and Spanish sparrows, red: Spanish sparrow). Bird drawings indicate species-specific 
male plumage characteristics of the three taxa [16]. Pie charts denote mean hybrid index at sampling localities where white and black color indicate 
house and Spanish sparrow genetic contribution, respectively. Locations with evidence of recent gene exchange between Spanish and Italian 
sparrows are indicated by arrows. 
doi:1 0.1 371 /journal.pgen.1 004075.g001 



ancestry, as well as in quantitative traits [23,25]. It is typically used 
to examine geographic clines where, given the assumption that the 
cline is maintained by a balance between dispersal into a hybrid 
zone and selection against hybrids [25-27], various parameters 
including the strength of selection acting on traits or loci can be 
estimated. Cline analysis is therefore useful for identifying loci 
involved in RI in hybrid zones [23]. However, many contact zones 
do not conform to the assumption of a dispersal/ selection balance 
and show a more complex pattern of contact and changes in locus- 
specific ancestry or allele frequency. This has led to the emergence 
of genomic cline analysis, in which geographic distance is replaced 
by a 'hybrid index', and cline width and location represent the 
amount and bias of introgression at a locus into the foreign 
genomic background [23,28-30]. With the caveat that use of 
genomic clines does not fully remove the influences of genetic drift 
and geographic structure alongside selection on introgression, 
these analyses can be employed on any geographic pattern of 
contact and so are amenable for use in studies of hybrid speciation. 

Bayesian genomic cline analysis (BGC) [29,31] fits the Barton 
cline and estimates the parameters ot (excess of house or Spanish 
alleles) and P (rate or steepness of cline), analogous to geographic 
cline center and width respectively [28,30]. As there was some 



variation between runs, our BGC analysis revealed 31-35 genes to 
have excess house sparrow ancestry while 18-22 genes exhibited 
excess Spanish sparrow ancestry (Figure 2, Figure 3, and Figure 
S2). Furthermore, 25-27 genes exhibited steeper clines than 
neutral expectations (Figure 3, Figure S2) while 14—16 genes 
exhibited clines shallower than neutral expectations. Of the 25-27 
genes with steeper clines than neutral expectations, 10 exhibited 
excess house sparrow ancestry, and another 5-8 exhibited excess 
Spanish sparrow ancestry. The remaining 8-10 genes exhibiting 
steep clines were not significantly shifted in either parental 
direction (Figure S2). 

Combined BGC and geographical analysis using Geneland 
[32,33] further revealed seven genes to exhibit abrupt allele 
frequency shifts and thus steep clines at the hybrid-parent range 
boundaries (Figure 2 and Figure 3, Table S2). Of these seven 
genes, five (i.e. 71.4%) were Z-linked (Figure 2 and Figure 3). 
Three of these Z-linked genes shifted at the Italian-Spanish 
boundary (Figure 2 and Figure S2) alongside mitochondrial JVD2, 
whereas clines in two Z-linked and one autosomal gene were 
located at the Italian-house boundary (Figure 2, Figure 3 and 
Figure S3). These results indicate a mosaic pattern of introgression 
along the Z chromosome, as predicted by previous results [15]. 
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A Southern dines B z C Northern dines 




Figure 2. Genetic incompatibilities between the hybrid Italian sparrow and its parent species. (A) Representative geographic dine (ND2) 
for the mitochondrion and Z-linked genes shifting significantly between the Italian and Spanish sparrows of mainland Italy/Sicily and Sardinia. Colors 
refer to posterior likelihood of belonging to group corresponding to the Italian sparrow (>0.9, no color) relative to the Spanish sparrow (<0.1, red). 
The numbers refer to three transects through the Italian-house sparrow hybrid zone in the Alps. Black dots denote sampling locations. (B) Genomic 
location (in zebra finch) of genes inferred to be involved in hybrid-parent reproductive isolation. Blue outlines denote genes shifting significantly 
between the Italian and house sparrow, and red outlines denote genes shifting significantly between the Italian and Spanish sparrow. Markers 
highlighted in yellow have significantly steeper dines (significant [3) than the neutral expectation according to a BGC-analysis (see main text) in 
addition to being significantly skewed towards either hybrid-parent species boundary (significant a), and hence represent the strongest candidate Rl 
genes. Markers in white have significant a only. Chromosomal location for the Z-linked and Chr. 4A genes are indicated. (C) Geographic dines along 
transect 2 for the three genes shifting significantly in the Italian-house sparrow hybrid zone in the Alps. Upper panel shows results from the Z-linked 
genes CHD1Z/CETN3, lower panel shows results for the autosomal gene RPS4. Colors refer to posterior likelihood of belonging to group 
corresponding to the house sparrow (>0.9, white) relative to the Italian sparrow (<0.1, red). Black dots denote sampling locations. 
doi:1 0.1 371 /journal.pgen.1 004075.g002 



There was a significant overrepresentation of Z-linkage among 
the genes exhibiting steep clines at the species boundaries 
considering that the Z chromosome holds about 3-7% of the 
genome of birds [9,34,35] (One-tailed binomial test: null 
probability based on flycatcher genome = 0.066, successes = 5, 
trials = 7, P=2.35xl0~ 5 ). 

As observed in previous studies [14,15], we found Italian 
sparrows to be nearly fixed for house sparrow mitochondrial 
haplotypes (Figure 2 and Figure 3; Table S2). Moreover, two of 
the three Z-linked genes that exhibit steep clines at the Italian- 
Spanish boundary are classified as nuclear-encoded mitochondrial 
proteins (HSDL2 and MCCC2) [36,37]. This is a significant 
overrepresentation of mitochondrial function compared to 8.3% in 
chickens [34,37] (One-tailed binomial test: null probabili- 
ty =0.083, successes = 2, trials =3, P=0.02; null probability data 
from The Gene Ontology Project's Gene Association file for Gallus 
gallus, GOC validation date: 4 December 2012, gaf-version 2.0 
and The Gene Ontology Project's Gene Ontology file, date: 4 
December 2012, cvs revision version 4708) among the nuclear 
genes shifting at this boundary. The Z-linked gene HSDL2 
exhibited a near-identical pattern of fixation for house sparrow 
alleles in the Italian sparrow as the mitochondrial marker ND2 
(Figure 3; Table S2). 

In three transects through the hybrid zone in the Alps, the Z- 
linked markers CETN3 and CHDl^ and autosomal marker RPS4 
exhibited the steepest clines (Figure 2 and Figure 3). Unlike in 



other bird species [9,34,35], CHDl^ and CETM3 appear to be 
tightly linked in sparrows (Figure S4). Outlier analyses indicated 
that CHD1£ but not CETN3 is a candidate for being under 
divergent selection (Figure S5). 

The three markers MCCC2, GTF2H2 and HSDL2 also show 
evidence of statistical association, although HSDL2 is predicted to 
be a long physical distance from the other two based on the zebra 
finch genome (Figure S4). A conservative estimate can therefore be 
made that two out of four sets of markers {MD2, RPS4, one from 
CHD1Z/CETN3 and one from HSDL2/MCCC2/ GTF2H2) with 
steep clines on range boundaries are Z-linked. This remains a 
significant overrepresentation of Z-linked markers (One-tailed 
binomial test: null probability = 0.066, successes = 2, trials = 4, 
P=0.02). 

While the steepest genomic clines were found for markers with 
major geographic clines at the Italian sparrow range boundaries 
(Figure 2 and Figure 3; Figure S2, Table S2), clines steeper than 
neutral expectations were also found within the Italian sparrow's 
range in a number of both autosomal and Z-linked genes (Figure 3c 
and Figure S2), some of which were also significandy shifted 
(significant oc; Figure 3 a-c, Figure S2). Seven other markers were 
strongly shifted towards an excess of Spanish sparrow alleles, but 
had clines much shallower than the rest (Figure 3 a-c). Not all of 
the oc and P estimates for these markers were significant however, 
and the parental allele frequency difference was <0.5 in every 
case. With such a low parental allele frequency difference and 
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Figure 3. Candidate Rl genes revealed by genomic dine analysis. (A-B) Markers are ordered along chromosomes as in Table S3. (A) BGC run 2 

estimates of genomic cline center (a) with 95% credibility intervals for all 86 SNP markers. Red dots indicate markers with significant excess house 
sparrow ancestry, blue dots indicate markers with significant excess Spanish sparrow ancestry, grey dots indicate markers with either significant 
house or Spanish excess ancestry but where the difference in allele frequency between the parent species are below 0.5, and black dots indicate 
markers that do not differ from neutral expectations. (B) BGC run 2 estimates of genomic cline rate (p) with 95% credibility intervals for all 86 SNP 
markers. Orange dots indicate markers with dines significantly steeper than neutral expectations, white dots indicate markers with significantly 
shallower dines than neutral expectations, grey dots indicate markers with either significantly steeper or shallower dines than neutral expectations 
but where the difference in allele frequency between the parent species is below 0.5, and black dots indicate markers that do not differ from neutral 
expectations. (C) Genomic cline center (a) plotted against genomic cline rate ((3). Red dots indicate markers that exhibit significant excess house 
sparrow (P. d.) ancestry, dines steeper than neutral expectations and where the difference in allele frequency between the parent species is greater 
than 0.5. The red dots that are encircled and named shift at the Italian-Spanish boundary and are hence candidate Rl genes. Blue dots indicate 
markers that exhibit significant excess Spanish sparrow (P. h.) ancestry, dines steeper than neutral expectations and where the difference in allele 
frequency between the parent species is greater than 0.5. The blue dots that are encircled and named shift at the Italian-house boundary and are 
hence candidate Rl genes. Green dots indicate markers that have allele frequency differences between the parent species greater than 0.5 and dines 
steeper than neutral expectations but do not exhibit excess house or Spanish sparrow ancestry. These markers are candidates for being 
incompatibilities within the Italian sparrow. Grey dots indicate markers in which the allele frequency difference between the parent species is less 
than 0.5. (D). Examples of BGC genomic dines representative of the marker categories described in panel (C). For illustrations of all dines, see Figure 
S2. 

doi:1 0.1 371 /journal.pgen.1 004075.g003 



significantly shallow rather than steep clines, we do not consider 
these as potential Rl genes. However, this does not rule out other 
forms of selection on these loci within the hybrid species. 

Whereas Sardinian Spanish sparrows show evidence of on- 
going introgression from Italian sparrows, Spanish sparrows in the 
recently established sympatric population in southeast Italy appear 
to be genetically pure (Table SI, 'Lesina (Spanish)'). We found no 
difference in F ST between these Spanish sparrows and sympatric 
Italian sparrows versus Spanish sparrows and nearby allopatric 
Italian sparrows (Table 1). Thus, there was no sign of gene flow in 
sympatry between Italian and Spanish sparrows. 

Discussion 

Reproductive barriers between the hybrid Italian sparrow 
and its parent taxa 

In this study, we utilized the Italian sparrow's gene exchange 
and geographical overlap with both parent species, house and 
Spanish sparrows, to investigate hybrid-parent reproductive 
barriers using a cline analysis framework. Results of geographic 
and genomic cline analyses reveal that several markers with an 
excess of Spanish sparrow alleles within Italian sparrows are 
associated with Rl at the house-Italian range boundary, and that 
several markers with house sparrow excess are associated with Rl 
at the Spanish-Italian range boundary. A disproportionately high 
number of the markers showing steep clines at one or the other 
hybrid-parent species boundary are Z-linked, including when 
potential physical linkage is accounted for. Of these SNPs, which 
are all within functioning transcribed genes, the strongest 
candidates to be within genes directly involved in Rl are those 



that exhibit the most extreme cline parameters. This includes 
CHD1£ (Z chromosome) and RPS4 (chromosome 4A) at the 
Italian-house sparrow boundary in the Alps, and HSDL2 (Z 
chromosome) and JVD2 (mitochondria) at the Italian-Spanish 
sparrow boundary between the Italian mainland and Sardinia. 
Furthermore, two of the three nuclear markers with steep clines 
coincident with that of the mitochondria at the Italian-Spanish 
boundary [HSDL2 and MCCC2) axe classified as nuclear-encoded 
mitochondrial proteins [36,37]; a statistical overrepresentation. As 
oudier loci revealed by cline analysis may result from genetic drift 
rather than from selection on introgression [29], we note that a 
significant overrepresentation of sex-linkage and mitochondrial 
function among candidate Rl genes as reported here is not 
expected to arise through drift. 

Excessive sex linkage of Rl supports the hypothesis that HHS in 
Italian sparrows is facilitated by divergent selection between the 



Table 1. Test for gene exchange in sympatry. 









Lesina Spanish 


Guglionesi 


Lesina Italian 


Guglionesi 


0.4163»»» 






Lesina Italian 


0.4105*** 


0.0025 




Mass Montanari 


0.3926*** 


0.0073 


0.0041 





FST values between Gargano peninsula populations of Spanish sparrows and 
sympatric (Lesina) and nearby allopatric Italian sparrows. 
***indicates highly significant genotypic divergence, P<0.0001. 
doi:1 0.1 371 /journal.pgen.1 004075.t001 
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parent species. HHS differs from non-hybrid speciation in that an 
isolating mechanism is required against each parent. This may be 
aided by transgressive segregation leading to extreme hybrid 
phenotypes [5,11]; a process promoted more by stabilizing 
selection in the parent species than by divergent selection 
[11,13]. Stabilizing selection is more likely to produce transgressive 
hybrid phenotypes through purely additive effects due to the 
greater likelihood of complementary gene action [1 1], While traits 
under divergent selection may also produce transgressive pheno- 
types in some circumstances, for example with epistasis, the 
emphasis on transgression in HHS represents a contrast to theories 
of non-hybrid adaptive speciation, in which genes under divergent 
selection are thought more likely to contribute to isolation 
[38, 39]. The mechanisms promoting Italian sparrow HHS may 
therefore more closely resemble those involved in typical non- 
hybrid speciation. While transgression cannot be ruled out, Italian 
sparrows appear phenotypically intermediate between the parents 
and share the house sparrow's human-commensal niche. 

We recognize that the genes with steep clines may represent 
neutral markers in linkage disequilibrium with the RI genes under 
selection (this is likely to be the case for the mitochondrial ND2, 
which is in linkage disequilibrium with all other mitochondrial 
genes), rather than being directly involved in RI. In the case of 
RPS4, there is no a priori expectation that it should be involved in 
RI. However this gene, the sole autosomal representative 
associated with hybrid-parent isolation, is found on chromosome 
4A in zebra finch [35]. This chromosome is orthologous to the 
mammalian X chromosome and RPS4 is in fact X-linked in 
Eutherian mammals [40]. Chromosome 4A appears to be 
enriched for genes with properties similar to sex-linked genes, 
including the avian homolog to the human sex-determining SRT, 
and about one third of this chromosome has even translocated to 
the sex chromosomes in the whole avian superorder Sylvioidea 
[40]. Hence, the sole autosomal gene with a significantly steep 
cline on a species boundary may be linked to genes with properties 
more typical of sex-linked genes. 

We hypothesized that CHDl^ may direcdy influence RI 
because it has been previously highlighted as a candidate 
speciation gene in other bird systems [15 and references therein]. 
This therefore represents a stronger RI gene candidate than RPS4 
or CETN3 on the Italian-house boundary. Though CHD1Z and 
CETN3 appear be closely linked to the same RI locus, our oudier 
analysis indicates CHDl^ is under divergent selection while 
CETN3 is not, and is therefore a more likely candidate to be the RI 
locus. CHD1 (the generic name for this gene in all organisms, 
represented by divergent Z-linked and W-linked copies in birds) is 
a chromatin-remodeling factor potentially affecting the expression 
of many genes [41]. Of particular interest is its essential role in 
chromosome centromere localization [42] . "Centromeric drive" is 
a proposed mechanism of intra-genomic conflict potentially 
causing rapid evolution of incompatibilities and speciation [43]. 
In this process, male-detrimental centromere drivers causing 
biased meiosis lead to selection for compensatory mutations in 
centromeric proteins, potentially including CHD1. Centromeric 
drive has been proposed as an important mechanism in the non- 
hybrid speciation of pied and collared flycatchers [9], in which 
CHD1£ also shows evidence of involvement in RI [17]. This adds 
weight to the argument that CHDl^ represents a candidate RI 
gene, maintaining parapatric differentiation between hybrid 
Italian sparrows and the house sparrow. 

We also hypothesized a role for mito-nuclear interactions, and 
in particular the tracking of spreading mitochondrial variants by 
nuclear male 'restorer' genes, compensating for 'mother's curse'. 
The overrepresentation of nuclear genes with a mitochondrial 



function on the Italian-Spanish sparrow boundary provides some 
support for this hypothesis. However, only one of these genes, 
HSDL2, shows a cline almost as steep as mitochondrial .ND2. While 
this may be a linked neutral marker, the fact that HSDL2 , % protein 
product is located within mitochondria [37,44] supports its 
candidacy as an RI gene. HSDL2 is thought to be involved in 
fatty acid metabolism, although its exact functions are unknown 
[44]. 

Our results thus appear consistent with the influence of mito-sex 
chromosome conflict acting as a reproductive barrier at the 
Spanish-Italian sparrow boundary. Furthermore, due to the lack of 
global dosage compensation in birds, Z-linked genes typically have 
higher expression and - combined with the fact that Z 
chromosomes spend two thirds of their time in males - stronger 
fitness effects in males than females. Consequently, genes with 
male-specific fitness effects are overrepresented on the Z chromo- 
some [8,45]. We thus postulate that nuclear male-compensatory 
'restorer' genes are most likely to occur on the Z chromosome, 
leading to reduced fitness in hybrids with mito-sex chromosome 
mismatches. HSDL2 in fact has been shown to have higher 
expression levels in male than female chickens [46]. As an 
alternative to "mother's curse", isolation through co-adaptation 
between nuclear and mitochondrial genes is also possible. Because 
natural selection can only act on such bi-directional co-evolution 
through female fitness effects, we propose that mito-nuclear co- 
adaptation should involve disproportionately many autosomal 
genes, as they spend equal time in both sexes and show no overall 
sex bias in gene expression [45]. Hence our results are more 
consistent with mito-nuclear conflict. 

There is no evidence for hybridization between Spanish 
and Italian sparrows in the sympatric zone of southeast Italy, 
supporting previous results [14]. This suggests a role for prezygotic 
barriers between the two taxa. Spanish sparrows are much 
less associated with humans than house and Italian sparrows 
and occupy a different habitat, providing some habitat- 
dependent assortative mating [16]. We suggest that evolution 
of Italian sparrows towards the house sparrow human- 
commensal niche may have contributed to rapid development of 
prezygotic isolation with Spanish sparrows alongside, or even 
reinforced by, the aforementioned mito-nuclear postzygotic 
barrier. 

Reproductive barriers within the hybrid Italian sparrow 

In addition to the steep clines at the species range boundaries, 
some genes exhibited clines steeper than the neutral expectation 
within the Italian sparrow's range. One possible interpretation of 
this result is that moving clines of incompatibility genes may have 
become trapped by environmental transitions or population 
density troughs before reaching the current hybrid-parent 
boundaries. In this way, intraspecific incompatibilities within a 
hybrid species may increase future diversification relative to non- 
hybrid species, in particular through the effect of divergence 
hitchhiking in promoting the build-up of novel isolating mecha- 
nisms surrounding pre-existing incompatibilities [47,48]. Isolation 
by adaptation may be occurring in Italian sparrows [49], so 
moving clines of incompatibility genes may also have become 
trapped by association with niche differentiation [50]. Neverthe- 
less, neutral processes cannot be ruled out. The spatial spread of a 
partially reproductively isolated taxon into the range of the other 
taxon may lead to neutral allele frequency clines at historical 
invasion wave fronts [51], although these would become more 
diffuse over time since the spread. 

The Italian sparrow genome represents a mosaic, particularly 
on the Z chromosome, in which some Spanish sparrow alleles 
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have spread to form reproductive barriers against house sparrows, 
while house sparrow alleles at different loci on the same 
chromosome have spread in the opposite direction to form 
barriers against Spanish sparrows. We envisage that HHS in the 
Italian sparrow may match the 'mosaic genome hybrid speciation' 
model [52] with the addition of secondary spatial spread of the 
hybrid genotype. Such discordant spread would most likely occur 
through selective sweeps, if one parental allele had a fitness 
advantage over the other in the mosaic genomic background. 
However, if strong selective sweeps occurred, some mechanism 
would be needed to cause them to stop at the current hybrid- 
parent boundaries. We note that these boundaries lie on major 
barriers to dispersal in the case of the Passer sparrows, and 
that beneficial alleles often do not sweep across the whole range 
of a species for a variety of reasons including geographical 
barriers. 

Conclusions 

Using a cline analysis framework we have identified sets of 
candidate RI genes and genomic regions between the hybrid 
Italian sparrow and its parent species. These results support our 
predictions that mito-nuclear interactions and loci on the Z 
chromosome strongly influence RI. In this regard, we suggest that 
HHS in the Italian sparrow resembles non-hybrid speciation, and 
we would therefore predict that the same loci would be involved in 
RI between the parent taxa; house and Spanish sparrows. An 
important next step is therefore to replicate this study in a region 
of parental sympatry and hybridization [53] . 

Materials and Methods 

Sample collection 

Only males, which are diploid for the Z chromosome, were 
analyzed to avoid issues related to haplodiploidy of the Z 
chromosome. Blood samples from the three taxa (« = 612) were 
taken from 64 locations between 2007 and 2011 (Figure 1 and 
Table SI): Spanish sparrows («= 142) from Badajoz, Spain, 
Sardinia, Italy and a Spanish/Italian sympatric zone in southeast 
Italy; allopatric house sparrows (n = 85) from Hradec Kralove, 
Czech Republic and Oslo, Norway; Italian sparrows, Italian-house 
hybrids and parapatric house sparrows from the Italian peninsula 
and the Alps (n = 385). The sparrows were caught using mist nets. 
About 25 |J,1 of blood was extracted by venipuncture of a brachial 
vein and stored in 1 ml of Queens lysis buffer. Appropriate 
catching and sampling permits were obtained for all sampling 
locations from the relevant authorities. DNA was isolated using 
Qiagen DNeasy 96 Blood and Tissue Kits (Qiagen N.V., Venlo, 
Netherlands) according to the manufacturer's instructions. For 
transcriptome sequencing, three house (Oslo, Norway) and three 
Spanish (Badajoz, Spain) sparrows of each sex were sampled in 
October 2010. Liver, heart and brain tissue samples were taken 
and stored on RNAlater (100 mg tissue in 1400 ul buffer) 
according to the manufacturer's protocol. 

Transcriptomic cDNA library preparation and sequencing 

Total RNA isolation from pooled liver, heart and brain samples 
followed by normalized cDNA library preparation was performed 
by Vertis Biotechnologie AG, Freising, Germany. Total RNA was 
isolated from the cell powders using the mirVana RNA kit 
(Ambion) including an on-column DNase treatment. From the 
total RNA samples, poly(A)+ RNA was prepared and fragmented 
with ultrasound (1 pulse of 30 sec at 4°C). First-strand cDNA was 
synthesized from the fragmented RNA using a N6 randomized 
primer and M-MLV RNaseH-reverse transcriptase. 454 adapters 



A and B were ligated to the 5' and 3' ends of the cDNA. The 
cDNA was amplified with PCR using a proof reading enzyme. 
Normalization was carried out by one cycle of denaturation and 
reassociation of the cDNA, resulting in N 1 -cDNA. Reassociated 
ds-cDNA was separated from the remaining ss-cDNA (normalized 
cDNA) by passing the mixture over a hydroxylapatite column. 
After hydroxylapatite chromatography, the ss-cDNA was PCR 
amplified. For GS FLX Titanium sequencing, the cDNA in the 
size range of 450-700 bp was eluted from preparative agarose 
gels. The resulting cDNA was double stranded, and had a size of 
about 450-700 bp. The six samples from each species were pooled 
in equal amounts and pyrosequenced on a Roche GS FLX 
Titanium sequencer at the Norwegian Sequencing Center using 
the manufacturer's protocol. 

Sequence alignment and SNP mining 

The house sparrow reads were aligned and mapped against the 
zebra finch genome, and reads of both species were then mapped 
against the resulting contigs, providing a list of potential species- 
informative SNPs. Species-diagnostic SNPs from the twelve 
samples were chosen and subsequendy filtered for those without 
sufficient flanking sequence for PCR-primer design. Genes were 
annotated by blasting against the zebra finch and chicken 
genomes. Two exceptions were SNPs within CHDl^ and ND2 
genes, which were genotyped using existing primers [15]. The 
two CHD1£ SNPs in this initial set are within an intron [15]. 
The genomes of Passer sparrows have so far not been 
mapped. Genomic locations of the various markers were therefore 
inferred based on the Zebra finch Taeniopygia guttata genome 
[35]. 

Genotyping 

Multiplex sets of PCR primers were designed and all individuals 
genotyped at each SNP locus using the Sequenom MassARRAY 
system at CIGENE, Norwegian University of Life Sciences, As, 
Norway. A total of 124 putatively diagnostic SNP markers from 
107 different genes were genotyped successfully. Statistical 
analyses were carried out on a subset of 86 species informative 
SNPs after further filtering (Table S3). This involved removing 
SNPs with <97% genotyping success over all samples, plus 
removing all but one SNP from each gene (except in APC and 
A2ML1, in which two markers were included due to marked 
differences in parental allele frequencies). 

Detection of gene exchange 

Recent migration between Spanish sparrows on Sardinia and 
Italian sparrows was identified using the USEPOPINFO model 
in STRUCTURE [22], with 100,000 iterations, a burnin 
of 50,000, GENSBACK set to three generations and the 
rest of the settings as default. A hybrid index value [54] was 
calculated for each individual, based on the 86 SNPs. In the Alps 
transects, the presence of many individuals with intermediate 
hybrid index indicated hybridization with house sparrows (Figure 
SI). 

Detection of SNPs associated with reproductive isolation 

If genotype frequencies for individual SNPs change more 
sharply with changing hybrid index than a neutral expectation, 
this indicates a potential association with reproductive isolation. 
Examination of individual SNP cline width with respect to hybrid 
index was carried out using Bayesian genomic clines method as 
implemented in BGC [31]. Spanish sparrows from Badajoz and 
Gargano, and house sparrows from Oslo and Hradec Kralove, 
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were used to indicate parental genotype frequencies (Table S3). 
For the genomic clines analyses, all individuals from the Alps, the 
Italian peninsula (excluding Spanish sparrows from the southeast 
Italian sympatric zone), Sicily and Sardinia were pooled into one 
admixed population. Genotypes at the mitochondrial ND2 locus 
were coded as diploid homozygote as BGC failed to run with a 
haploid marker included. Three independent runs with 100,000 
iterations each were run with the first 25,000 iterations discarded 
as burnin, MCMC samples thinned by recording every fifth value, 
while the rest of the BGC settings were as default. SNPs were 
identified as significantly deviating from null expectations when 
the 95% credibility intervals of the cline parameters a and (5 did 
not cross zero. Once candidate SNPs were chosen using the 
genomic clines approach, geographic locations of sharp changes in 
allele frequency for each SNP were identified in GENELAND 
[32,33] using the uncorrelated allele frequency model and 
allowing 1-10 clusters. Each SNP was run three times at 3 million 
iterations with a burn-in of 200. For some SNPs Geneland 
identified more than two geographic clusters, indicating multiple 
rapid changes in allele frequency. In these cases the main cline was 
determined to be on a species boundary if Geneland identified a 
cline on that boundary and BGC indicated a significantly shifted 
cline center (significant Qt) (Figure 2 and Figure S2). We further 
narrow our focus primarily to markers that also have significandy 
steep clines (significant positive P) in all three BGC runs (Figure 2 
and Figure S2). 

Linkage disequilibrium 

On top of using the zebra finch genome to identify gene 
location, genetic linkage was estimated using GENEPOP [55] to 
calculate a P value for genotypic disequilibrium between every 
SNP pair and by employing a Fisher test to combine probabilities 
across all populations in the Italian peninsula, the Alps and 
Sardinia (Figure S4). 

Population divergence in sympatric Italian and Spanish 
sparrows 

To assess if there is introgression between sympatric Italian and 
Spanish sparrows, F &T values and genotypic differentiation were 
calculated between four populations in southeast Italy (Lesina, 
which is a sympatric population of Italian and Spanish sparrows, 
and the nearby allopatric Italian sparrow populations of Mass. 
Montanari and Guglionesi) in GENEPOP [55] using males and 
the chosen 86 SNPs. F ST estimates were calculated between i) the 
Italian sparrows from each population and ii) the Spanish 
sparrows in Lesina and the Italian sparrows in each of the three 
populations. A shift in ,F SX towards or away from Spanish sparrows 
in Italian sparrows from Lesina, relative to the two allopatric 
populations, would indicate introgression or displacement respec- 
tively. 

Outlier tests for divergent selection 

BAYESCAN [56] and LOSITAN [57] are softwares that 
implement methods to test for evidence of both divergent and 
balancing selection through F ST outlier analysis of molecular 
markers. We used both softwares to determine which of CHD1£ 
and CETN3, two highly linked markers, showed stronger evidence 
for selection in the Alps house-Italian sparrow hybrid zone. Data 
from all 85 nuclear SNPs (haploid mtDNA markers cannot be run 
alongside diploid markers, and ND2 is invariant in the Alps) were 
used and the three transects were pooled, excluding sites with just 
a single individual sampled. For LOSITAN the options 'neutral 
mean F ST ' and 'force mean F ST ' were chosen, along with the 



infinite alleles mutation model and 50 k simulations. For 
BAYESCAN, default settings were used. 

Ethics statement 

Handling of birds were conducted according to guidelines 
approved by the relevant authorities in the respective countries 
(Museum National d'Histoire Naturelle, Centre de Recherches sur 
la Biologie de Populations d'Oiseaux, Paris (France), Institute for 
Environmental Protection and Research - ISPRA (Italy), Con- 
sejeria de Industria, Energia y Medio Ambiente (Spain), Norwe- 
gian Food Safety Authority (Norway), Ministrstvo za okolje in 
proctor, Agencija Republike Slovenije za okolje (Slovenia) and 
Bundesamt fur Umwelt BAFU, Abteilung Artenmanagement 
(Switzerland)). 

Supporting Information 

Figure SI Individuals with intermediate crown color have 
already been reported in the Alps contact zone [14,16]. Further 
evidence of hybridization comes from a large proportion of 
individuals with hybrid index intermediate between house and 
Italian sparrows in the contact zone. Hybrid index for each 
individual is plotted against distance from the house sparrow end 
of each transect, along a) transect 1 (see Fig. 2), b) transect 2 and c) 
transect 3. d) Histogram of hybrid indices for populations at the 
house sparrow end of the three transects ('House'), the rest of the 
transect collections ('Hybrid zone'), and northern Italian popula- 
tions away from the transects ('Italian'). 
(DOC) 

Figure S2 Bayesian genomic clines for all 86 SNP markers. The 
legend in each panel indicates genomic location (chromosome 
number), SNP name and significance of cline center (a) and cline 
rate ((3) parameters for three independent BGC runs. Significant 
excess house sparrow ancestry is indicated by H's, significant 
excess Spanish sparrow ancestry is indicated by S's, significandy 
steep clines are indicated by +'s and significantly narrow clines are 
indicated by — 's. The number of H's, S's, +'s and — 's indicate for 
how many runs the parameter estimates for a given marker 
differed from neutral expectations. When none of the runs differed 
from neutral expectations, this is indicated by ns. Red panels 
indicate markers that exhibit significant excess house sparrow 
ancestry and have clines steeper than neutral expectations, where 
the difference in allele frequency between the parent species is 
greater than 0.5 and for which Geneland revealed a shift at the 
Italian-Spanish sparrow boundary. These are hence candidate RI 
genes. Blue panels indicate markers that exhibit significant excess 
Spanish sparrow ancestry, clines steeper than neutral expectations, 
a difference in allele frequency between the parent species greater 
than 0.5 and with a shift at the Italian-house sparrow boundary. 
These are hence candidate RI genes. Green panels indicate 
markers that have allele frequency differences between the parent 
species greater than 0.5 and where the clines are steeper than 
neutral expectations but do not exhibit a major allele frequency 
cline on either hybrid-parent species boundary. These markers are 
candidates for being incompatibilities within the Italian sparrow. 
Only markers with significant '+' on more than one BGC run are 
highlighted. 
(DOC) 

Figure S3 GENELAND geographic clines for Alps transects 1-3 
(Fig. 2 a), for the three loci exhibiting rapid changes in allele 
frequency coinciding with the hybrid zone between Italian and 
house sparrows (CHD1Z CETM3 and RPS4; see main text). Axes 
represent longitude (x axis) and latitude (y axis) in decimal degrees. 
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Colors refer to posterior likelihood of belonging to the group 
corresponding to the house sparrow (>0.9, white) relative to the 
Italian sparrow (<0.1, red). Black dots denote sampling locations. 
CHD1Z and RPS4 results for transect 2 are also represented in 
Fig. 2c. 
(DOC) 

Figure S4 Heat map of genotypic disequilibrium between Z- 
linked markers with significandy steep genomic clines. Marker 
positions in kbp are based on the zebra finch genome. Internal cell 
values indicate distances between genes in kbp. Cell colors 
represent genotypic disequilibrium, combining P values across all 
Italian peninsula and Sardinian populations. Dark blue P= 0; 
medium blue P<0.05; pale blue P<0.1. 
(DOC) 

Figure S5 _F ST -outlier analyses. (A) LOSITAN graphic of test for 
evidence of divergent selection in the Alps, using the 85 nuclear 
SNPs and combining data from all three transects (not including 
populations represented by a single individual). SNPs in the red 
area are candidates for positive divergent selection. At a false 
discovery rate of 0.05 only CHDl^ and RPS4 were significant 
outliers for divergent selection. (B) BAYESCAN graphic of test for 
evidence of divergent selection in the Alps, using the 85 nuclear 
SNPs and combining data from all three transects (not including 
populations represented by a single individual). The estimated 
alpha coefficient indicates the strength and direction of selection. 
A positive value of alpha suggests diversifying selection, whereas 
negative values suggest balancing or purifying selection. CHD1J? 
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